Getting the map for the area of interest

In [2]:
import folium
In [3]:
# Singapore's coordinates according to Google: 1.3521° N, 103.8198° E
sg_map = folium.Map(location=[1.3521, 103.8198], zoom_start=12)
In [4]:
sg_map
Out[4]:

Loading the data into "Dataframes"

Dataframes are a construct in R used to store data

In [1]:
import pandas as pd
In [2]:
fname = "data/resale-flat-prices/resale-flat-prices-based-on-registration-date-from-jan-2015-onwards.csv"
In [3]:
df = pd.read_csv(fname)
In [4]:
df.head()
Out[4]:
month town flat_type block street_name storey_range floor_area_sqm flat_model lease_commence_date remaining_lease resale_price
0 2015-01 ANG MO KIO 3 ROOM 174 ANG MO KIO AVE 4 07 TO 09 60.0 Improved 1986 70 255000.0
1 2015-01 ANG MO KIO 3 ROOM 541 ANG MO KIO AVE 10 01 TO 03 68.0 New Generation 1981 65 275000.0
2 2015-01 ANG MO KIO 3 ROOM 163 ANG MO KIO AVE 4 01 TO 03 69.0 New Generation 1980 64 285000.0
3 2015-01 ANG MO KIO 3 ROOM 446 ANG MO KIO AVE 10 01 TO 03 68.0 New Generation 1979 63 290000.0
4 2015-01 ANG MO KIO 3 ROOM 557 ANG MO KIO AVE 10 07 TO 09 68.0 New Generation 1980 64 290000.0

We filter out entries for just the year 2017 to make plotting a bit faster.

In [5]:
df = df[df["month"].str.contains("2017")].reset_index(drop=True)

Many of the columns are not very useful for the purposes of mapping so we drop them

In [7]:
df = df.drop(["month", "flat_type", "storey_range", 'flat_model', "lease_commence_date", "remaining_lease"],axis=1)
In [8]:
# Look at the number of rows and columns
print(df.shape)
df.head()
(18825, 5)
Out[8]:
town block street_name floor_area_sqm resale_price
0 ANG MO KIO 406 ANG MO KIO AVE 10 44.0 232000.0
1 ANG MO KIO 330 ANG MO KIO AVE 1 68.0 338000.0
2 ANG MO KIO 333 ANG MO KIO AVE 1 68.0 330000.0
3 ANG MO KIO 218 ANG MO KIO AVE 1 67.0 285000.0
4 ANG MO KIO 219 ANG MO KIO AVE 1 67.0 297000.0

Calculating price per square meter & normalization

In [12]:
df["ppsm"] = df.resale_price/df.floor_area_sqm
In [13]:
# Normalization to get prices in a 0-1 range
max_ppsm = df.ppsm.max()
min_ppsm = df.ppsm.min()
df["normalized"] = df.ppsm.apply(lambda x: (x - min_ppsm)/(max_ppsm - min_ppsm)) 
In [14]:
df.head()
Out[14]:
town block street_name floor_area_sqm resale_price ppsm normalized
0 ANG MO KIO 406 ANG MO KIO AVE 10 44.0 232000.0 5272.727273 0.331621
1 ANG MO KIO 330 ANG MO KIO AVE 1 68.0 338000.0 4970.588235 0.293965
2 ANG MO KIO 333 ANG MO KIO AVE 1 68.0 330000.0 4852.941176 0.279302
3 ANG MO KIO 218 ANG MO KIO AVE 1 67.0 285000.0 4253.731343 0.204621
4 ANG MO KIO 219 ANG MO KIO AVE 1 67.0 297000.0 4432.835821 0.226943

Geocoding to get lat lng coordinates

We use the onemap api which geocodes Singaporean addresses: https://docs.onemap.sg/#authentication-service-post

In [15]:
import requests
In [26]:
def geocode(block, streetname):
    url = "https://developers.onemap.sg/commonapi/search?"
    address = block + " " + streetname
    address = address.replace(" ", "%20")
    http_req = url + "searchVal=" + address + "&returnGeom=Y&getAddrDetails=Y&pageNum=1"
    r = requests.get(http_req)
    try:
        lat = r.json()["results"][0]["LATITUDE"]
        lng = r.json()["results"][0]["LONGITUDE"]
    except:
#         print(http_req)
#         print(r.json()["results"])
        return 0, 0
    return float(lat),float(lng)

Plotting coordinates with colored markers according to normalized ppsm

In [27]:
import colorsys
from tqdm import *
In [28]:
for i in tqdm(range(df.shape[0])):
    blk = df.iloc[i]["block"]
    streetname = df.iloc[i]["street_name"]
    lat, lng = geocode(blk,streetname)
    # Ignore geocoding errors
    if lat == 0 and lng == 0:
        continue
    price = df.iloc[i]["resale_price"]
    normalized_price = df.iloc[i]["normalized"]
    # assign a colour based on the position of the value in the range of 0 to 1
    fc = "#%02x%02x%02x" % tuple(map(lambda x: int(x * 255), colorsys.hsv_to_rgb(normalized_price,1,1)))
    folium.RegularPolygonMarker(
        [lat, lng],
        popup="$" + str(int(price)),
        fill_color=fc,
        number_of_sides=4,
        radius=5
        ).add_to(sg_map)
  0%|          | 0/18825 [00:00<?, ?it/s]
  0%|          | 1/18825 [00:00<1:51:30,  2.81it/s]
  0%|          | 2/18825 [00:00<1:29:54,  3.49it/s]
  0%|          | 3/18825 [00:00<1:12:24,  4.33it/s]
  0%|          | 4/18825 [00:00<1:00:30,  5.18it/s]
  0%|          | 5/18825 [00:00<52:15,  6.00it/s]  
  0%|          | 6/18825 [00:00<48:33,  6.46it/s]
  0%|          | 7/18825 [00:01<44:36,  7.03it/s]
  0%|          | 8/18825 [00:01<43:52,  7.15it/s]
  0%|          | 10/18825 [00:01<39:47,  7.88it/s]
  0%|          | 11/18825 [00:01<37:56,  8.27it/s]
  0%|          | 13/18825 [00:01<35:51,  8.75it/s]
  0%|          | 14/18825 [00:01<34:44,  9.02it/s]
  0%|          | 15/18825 [00:01<34:28,  9.09it/s]
  0%|          | 16/18825 [00:01<34:05,  9.20it/s]
  0%|          | 17/18825 [00:02<33:37,  9.32it/s]
  0%|          | 19/18825 [00:02<32:21,  9.69it/s]
  0%|          | 20/18825 [00:02<33:07,  9.46it/s]
100%|██████████| 18825/18825 [34:36<00:00, 10.23it/s] 
Out[28]:

Investigating geocoding errors

In [123]:
df.iloc[676]
blk = df.iloc[676]["block"]
streetname = df.iloc[676]["street_name"]
geocode(blk, streetname)
https://developers.onemap.sg/commonapi/search?searchVal=15%20ST.%20GEORGE'S%20RD&returnGeom=Y&getAddrDetails=Y&pageNum=1
[]
Out[123]:
(0, 0)

Most of the errors happen for "St. George's Rd" which isn't in the onemap api.

Saving the map as a html

In [29]:
sg_map.save("test_ppsm.html")
In [ ]: